Keyword [CPN]
Chen Y, Wang Z, Peng Y, et al. Cascaded pyramid network for multi-person pose estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7103-7112.
1. Overview
Challenging cases in multi-person pose estimation include
- occluded keypoints
- invisible keypoints
- complex background
The paper proposes the Cascaded Pyramid Network (CPN)
- GlobalNet. Localizes the simple keypoints
- RefineNet. Explicitly handles the hard keypoints (online hard keypoint mining)
- Top-down pipeline. A detector generates the human boxes first
1.1. Contribution
- CPN
- Explores the effects of various factors in the top-down pipeline
1.2. Related Works
- Classical. pictorial structure, graphical model, tree structure and hand-crafted feature
- Multi-Person. top-down and bottom-up
- Single-Person. regressors, heatmap and score map
- Human Detection. one-stage and two-stage
1.3. Dataset & Metrics
- MS COCO. trainval (57k images and 150k person instances), minival (5k images), test-dev (20k) and test-challenge (20k).
- OKS-based mAP. (OKS: object keypoint similarity)
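As a rough illustration of the metric, OKS can be sketched in plain Python using the standard COCO definition, where each labeled keypoint contributes exp(-d^2 / (2 * s^2 * k^2)). The function name `oks` and its argument layout are assumptions made for this note; the per-keypoint constants `k` are dataset-defined and must be supplied by the caller.

```python
import math

def oks(pred, gt, visibility, area, k):
    """Object keypoint similarity between one predicted and one ground-truth pose.

    pred, gt:    lists of (x, y) keypoint coordinates
    visibility:  list of ints, 0 means the keypoint is unlabeled
    area:        object scale squared (s^2)
    k:           per-keypoint constants (dataset-defined, passed in by the caller)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, visibility, k):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * ki ** 2))
        count += 1
    return total / count if count else 0.0
```

mAP is then obtained by thresholding OKS the same way box IoU is thresholded in detection.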
2. Architecture
2.1. GlobalNet
- Top-Down: C2, C3, C4, C5.
- C2,C3. High spatial resolution for localization but low semantic information for recognition
- C4,C5. More semantic information but low spatial resolution
- Drawback: hard keypoints require more context than the nearby appearance features provide
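The top-down pathway over C2 to C5 can be sketched as below, assuming an FPN-style merge in which each coarser map is upsampled 2x and added to the next finer one. This is only a toy version on nested lists: the helper names `upsample2x` and `top_down_merge` are hypothetical, and the convolutions that a real network applies around each merge are omitted.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out

def top_down_merge(laterals):
    """Merge feature maps ordered fine to coarse (e.g. C2, C3, C4, C5).

    Each map is assumed to be half the spatial size of the previous one.
    Returns the merged maps in the same fine-to-coarse order.
    """
    merged = [laterals[-1]]  # start from the coarsest, most semantic level
    for lat in reversed(laterals[:-1]):
        up = upsample2x(merged[0])
        fused = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lat, up)]
        merged.insert(0, fused)
    return merged
```

The point of the merge is that the high-resolution levels inherit semantic information from C4 and C5 while keeping their spatial detail.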
2.2. RefineNet
- Stack more bottleneck blocks in the deeper layers (smaller spatial size)
- Explicitly select the hard keypoints online (top M) based on the training loss, and backpropagate the loss from them only.
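A minimal sketch of the online hard keypoint selection, assuming a per-keypoint mean squared error between predicted and target heatmaps. The function name `ohkm_loss` and the flat-list heatmap layout are illustrative choices, not taken from the paper's code.

```python
def ohkm_loss(pred, target, top_m=8):
    """Average the top_m largest per-keypoint losses.

    pred, target: one heatmap per keypoint, each given as a flat list of floats.
    Only the selected keypoints contribute to the returned value, so in an
    autodiff framework only they would receive gradient.
    """
    per_keypoint = []
    for p, t in zip(pred, target):
        mse = sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)
        per_keypoint.append(mse)
    hardest = sorted(per_keypoint, reverse=True)[:top_m]  # keep the M hardest
    return sum(hardest) / len(hardest)
```

The selection is redone at every training step, so which keypoints count as hard changes as the network improves.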
3. Experiments
3.1. Data Process
- Extend each box to a 256:192 (height:width) aspect ratio, then resize the crop to 256x192
- flip, rotation (-40~+40), scale (0.7~1.3)
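The box extension step can be sketched as follows, assuming the box is grown around its center until height:width equals 256:192. The function name `extend_box` and the (x, y, w, h) top-left convention are assumptions for illustration.

```python
def extend_box(x, y, w, h, aspect_hw=256 / 192):
    """Grow a box around its center so that h / w equals aspect_hw.

    (x, y) is the top-left corner; w and h are width and height.
    The box is only ever enlarged, so the person is not cropped or distorted.
    """
    cx, cy = x + w / 2.0, y + h / 2.0
    if h / w < aspect_hw:
        h = w * aspect_hw   # box is too wide: grow the height
    else:
        w = h / aspect_hw   # box is too tall: grow the width
    return cx - w / 2.0, cy - h / 2.0, w, h
```

The extended box may reach outside the image, so a real pipeline would still need padding when cropping.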
3.2. Test
- ensemble mechanism
3.3. Ablation Study
3.3.1. NMS strategy
Soft-NMS outperforms hard-NMS.
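For reference, the difference between the two strategies can be sketched as below. This uses the Gaussian variant of soft-NMS, which is an assumption since the notes do not say which variant was evaluated; the names `iou` and `soft_nms` and the default `sigma` and `score_thresh` values are illustrative.

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS. Returns (box, score) pairs in selection order."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        box, score = remaining.pop(best)
        kept.append((box, score))
        decayed = []
        for other, s in remaining:
            # Decay the score by overlap instead of discarding the box outright
            s *= math.exp(-iou(box, other) ** 2 / sigma)
            if s >= score_thresh:
                decayed.append((other, s))
        remaining = decayed
    return kept
```

Hard-NMS would drop every box whose overlap with a selected box exceeds a threshold, whereas here heavily overlapping boxes survive with a reduced score, which can help when people overlap in crowded scenes.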
3.3.2. Detector Performance
Detector AP is less important for pose estimation: gains in detection AP yield diminishing gains in keypoint AP.
3.3.3. Hard Keypoints Number
M = 8 works well.
3.3.4. With\Without
3.3.5. Concatenation
3.3.6. Dilation
Dilation increases both AP and FLOPs.